The benefits of diversifying risks are difficult to estimate quantitatively because of the
uncertainties in the dependence structure between the risks. Also, the modelling of multidimensional
dependencies is a non-trivial task. This paper focuses on one such technique for
portfolio aggregation, namely the aggregation of risks within trees, where dependencies are
set at each step of the aggregation with the help of some copulas. We rigorously define this
procedure and then study extensively the Gaussian Tree of quite arbitrary size and shape,
where individual risks are normal, and where the Gaussian copula is used. We derive exact
analytical results for the diversification benefit of the Gaussian tree as a function of its shape
and of the dependency parameters.

Such a "toy-model" of an aggregation tree enables one to understand the basic phenomena
at play while aggregating risks in this way. In particular, it is shown that, for a fixed number
of individual risks, "thin" trees diversify better than "fat" trees. Related to this, it is shown
that hierarchical trees have the natural tendency to lower the overall dependency with respect
to the dependency parameter chosen at each step of the aggregation. We also show that these
results hold in more general cases outside the Gaussian world, and apply notably to more
realistic portfolios (LogNormal trees). We believe that any insurer or reinsurer using such
a tool should be aware of these systematic effects, and that this awareness should strongly
call for designing trees that adequately fit the business.

We finally address the issue of specifying the full joint distribution between the risks. We
show that the hierarchical mechanism neither requires nor specifies the joint distribution,
but that the latter can be determined exactly (in the Gaussian case) by adding conditional
independence hypotheses between the risks and their sums.

I. INTRODUCTION

In this introduction, we first review in Sections I A and I B some basics about diversification
benefit, dependencies, risk measures and portfolio aggregation, and the interplay between these
notions from a Quantitative Risk Management perspective. In Sections I C and I D we then
present hierarchical aggregation trees, and finally discuss the plan of the paper in Section I E.

A. Diversification benefit and dependencies 

The core business of insurers and reinsurers is taking risks. Their ability to survive and to grow 
therefore critically depends on their capacity to diversify these risks, both on the liability and 
on the asset sides. Such a commonplace sadly gets a new flavor these days, as the financial
industry is discovering that some assets which have long been considered
risk-free (government bonds) might not, after all, be such secure investments.

Although the present paper does not deal with this critical issue, we believe that this very
uncomfortable revolution for the financial industry calls for having at hand robust and well-understood
risk management tools. In these rather uncertain days, we think it is important to understand
better how, and to what quantitative extent, financial institutions benefit from the diversification
of their portfolios.

The benefit of diversification decreases for correlated risks. The question of accurately modelling
the diversification thus amounts to adequately modelling the dependencies between the risks. This is
a very challenging task. Indeed, although the individual risks constituting the portfolio might
easily be described with an appropriate stochastic model derived from data and/or expert
opinion, it is often the case that very few joint observations are available, in which case the
joint distribution of the individual risks is basically unknown. In this respect, the risks driven
by extreme events are particularly relevant¹, because then both marginal and joint observations
are scarce [1, 2]. Moreover, extreme risks have the tendency to correlate in the tail, for which
joint observations become extremely rare [3].

¹ In particular to reinsurance companies which underwrite excess of loss contracts.

As a consequence, it is in general non-trivial to get a picture of the dependency structure
between the individual risks. In recent years, copulas [5] have become the preferred tool
to overcome this difficulty. In a word, copulas are multivariate functions that allow separating
the dependence structure from the margins. Copulas can then be used to define a dependence
structure between margins, taking into account, in particular, the tail dependencies. Moreover,
there exist asymmetric copulas (such as the Clayton copula) which can reflect the asymmetry
in the dependence between actual risks (risks which are correlated only in the tail) [6].

The present paper deals with the construction of a dependency structure based on copulas, via 
a hierarchical aggregation of risks. The diversification then becomes a function of the copulas 
used and also depends on the details of the hierarchical aggregation scheme. 



B. Diversification, risk measures and portfolio aggregation 

In order to compute the diversification benefit, it is necessary to have a measure of the risk
carried by the full portfolio: the sum at risk. Then, the diversification benefit is measured through
the ratio of the sum at risk of the actual portfolio to a fictitious sum at risk in the case
where the risks are fully dependent on each other (for a precise definition see Section III).

A widely used sum at risk is the Value-At-Risk (VaR) at some threshold $\alpha$, typically $\alpha = 0.01$.
More generally, the sum at risk is defined through a risk measure [4]. In this paper we will rather
use the expected shortfall at some threshold $\alpha$, $\mathrm{ES}_\alpha$, also called the Tail-Value-At-Risk or TVaR.
This risk measure has been shown to be more satisfactory than the VaR from a mathematical
perspective, see [4, 7].

Because risk measures are not linear in their arguments, however, i.e. $\mathrm{TVaR}(\sum_i X_i) \neq \sum_i \mathrm{TVaR}(X_i)$ in general,
it is necessary to know the distribution of the total portfolio $Z = \sum_i X_i$ in order to compute the
diversification benefit. This is what portfolio aggregation, or risk aggregation, refers to: portfolio
aggregation is the computation of the distribution of $Z = \sum_i X_i$, given the marginals $X_i$ and a
dependence structure (more on this in Section II).
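
As a minimal numerical sketch of this non-additivity (it is not taken from the paper), the snippet below estimates a lower-tail TVaR from simulated samples of two independent standard normal risks and of their sum. The helper name `empirical_tvar`, the sample size, the seed, and the convention that the TVaR is the average of the worst $\alpha$-fraction of outcomes are all illustrative assumptions.

```python
import random

def empirical_tvar(samples, alpha):
    """Average of the lowest alpha-fraction of the samples (assumed tail convention)."""
    ordered = sorted(samples)
    n_tail = max(1, int(alpha * len(ordered)))
    return sum(ordered[:n_tail]) / n_tail

rng = random.Random(0)            # fixed seed, for reproducibility only
n, alpha = 200_000, 0.01
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]   # drawn independently of x1
z = [a + b for a, b in zip(x1, x2)]            # total portfolio

tvar_sum = empirical_tvar(z, alpha)                                # TVaR of the sum
sum_tvar = empirical_tvar(x1, alpha) + empirical_tvar(x2, alpha)   # sum of the TVaRs
print(tvar_sum, sum_tvar)
```

Under these assumptions the tail average of the sum is noticeably less extreme than the sum of the individual tail averages, which is precisely the effect that the diversification benefit is meant to quantify.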

Although we focus on diversification in this paper, note that there are other motivations than
pure risk management for aggregating risks together, such as capital management, capital
allocation, business steering, and profitability analysis [6].



C. Hierarchical aggregation of correlated risks 



Since one has only a limited knowledge about the dependency between the risks, the usual
realistic framework for risk aggregation is to consider that only the distributions of the individual
risks are available, but that little or nothing is known about the joint distribution. Then one might
aggregate the risks while assuming independence between them. This would obviously lead to a
very poor estimate for the distribution of $Z$ and to a large overestimation of the diversification
benefit. One might instead try to quantify the uncertainty on the sum at risk originating
from the lack of information, e.g. via the computation of bounds on the sum at risk. Recently a
general framework which interpolates between marginal knowledge and full knowledge of the
joint distribution has been introduced along these lines in [8]. More generally, the question of
aggregating correlated risks has attracted a lot of attention in the past years (for a short review
see e.g. [9]).

In the present paper we follow yet another route, where we assume enough information to be able
to compute the total portfolio and the sum at risk (and hence the diversification) but without
assuming that the full joint distribution of the individual risks is known. Moreover, this
information is meant to be information actually attainable through actuarial work and expert
opinion.

The basic idea is a hierarchical aggregation of risks, which can be represented graphically as a
tree of aggregation (see Section II). To our knowledge, the idea was first presented in [10, Chap. 8]
and [6]. The present article extends these works in many respects, see Sections I D and I E.

The idea is a step-by-step aggregation, where one first aggregates together risks within N'
disjoint subsets of the set of all the individual risks, by noticing that risks in the same subset
share some common features (e.g. lines of business, region of the world, etc.). It then makes
sense from an actuarial point of view to tie these risks together, i.e. to assume some dependency
between these risks. The dependency is set via some suitably chosen and calibrated copula [I]^, 
which specifies the joint distribution between these risks. As a consequence, the respective N' 
partial sums of the individual risks are then calculable. 

As a result of this first aggregation step, one ends up with partial sums of the individual risks.
The procedure can then be repeated at upper levels, by noticing some other common features
or dependencies between these partial sums, and tying them together via another set of copulas.

The process will be illustrated in Section I D below and defined more rigorously in Section II.
At the end of the process, one gets a random variable representing the total portfolio, the 
distribution function of which is not only a function of the individual risks, but also of the 
dependencies that have been chosen. 

When compared to the naive method of taking into account dependencies via a correlation
matrix between the individual risks, the above hierarchical aggregation has several major
advantages. First, it is based on copulas, which are well known to be a more realistic measure of
dependence (especially in the tail) than simple linear correlations. Second, and more importantly,
the estimation of linear correlation coefficients is impossible in practice for large and
realistic portfolios, because the number of unknowns goes as $N(N-1)/2$ for $N$ individual risks,
whereas $N$ can be as large as $\sim 1000$ for worldwide companies. In a hierarchical tree of
aggregation however, the number of parameters can be considerably reduced (see next sections),
and it becomes feasible to calibrate them in a realistic way [4, 6].
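
To give an order of magnitude for this reduction, the sketch below compares the number of pairwise correlations with the number of aggregation nodes of a tree. It assumes, purely for illustration, a regular tree in which every aggregation node has `k` children over `m` levels and carries a single dependency parameter; the function names are hypothetical.

```python
def pairwise_correlations(n_risks):
    """Number of distinct off-diagonal entries of an n x n correlation matrix."""
    return n_risks * (n_risks - 1) // 2

def tree_parameters(k, m):
    """Number of aggregation nodes of a regular tree with k children per node
    and m levels, ASSUMING one dependency parameter per aggregation node."""
    return sum(k ** p for p in range(m))   # nodes at levels 0 .. m-1

k, m = 10, 3                  # illustrative shape: 10**3 = 1000 leaves
n_leaves = k ** m
print(pairwise_correlations(n_leaves))   # 499500
print(tree_parameters(k, m))             # 111
```

With these illustrative numbers, roughly half a million correlation coefficients are replaced by about a hundred copula parameters.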

Note that one could have also chosen a direct copula-based aggregation (i.e. a flat tree, see
Section II A) by imposing directly some explicit $N$-dimensional copula between the individual
risks. This would fully specify the joint distribution and allow in particular computing the total
portfolio. As is well known however, it is very difficult to construct multivariate copulas, and
only some families of them are known [4, 5]. In particular, one would not be able to find (or
even look for) a suitably defined $N$-dimensional copula that would adequately reflect the actual
dependencies between all the individual risks. Indeed, it is important to realize that among all
the individual risks, there might be both very dependent risks (e.g. wind France and wind
Germany) and uncorrelated risks (e.g. wind France and medical malpractice in Japan).
Defining an $N$-dimensional copula given a set of such requirements would largely go beyond the
current knowledge about copulas.

D. Aggregation trees and their topology 

We saw that the use of hierarchical trees for portfolio aggregation has many advantages with
respect to other methods. In short, it enables the construction of a viable dependency structure
between the individual risks. Such a construction is realistic in the sense that it is based on
copulas and requires only little information and few parameters. It therefore solves (in principle) the
difficult problem of risk aggregation, while its application is feasible in practice (and indeed
used by some companies).

The great reduction of free parameters that we discussed should however not be misunderstood. 
It would indeed be misleading to think that the only freedom left in defining an aggregation 
tree lies in the calibration of the dependencies and the type of copulas used. In fact a lot of 
freedom has simply been hidden in the order in which the risks are aggregated together and 
in the general shape of the tree. Loosely speaking thus, the "topology" of the tree has a direct 
impact on the final distribution function, and therefore on the sum at risk carried by the full 
portfolio.

The present paper will essentially focus on the role of the shape of the tree, not on the role
of the order of aggregation. Let us however simply illustrate the latter on a concrete example.
Consider four assets: two government bonds, in countries 1 and 2 respectively, and two stock
market portfolios, in countries 1 and 2 respectively.

One might then consider the following aggregation tree:

[Figure: aggregation tree in which B1 and B2 are aggregated into SR, S1 and S2 are aggregated into MR, and SR and MR are then aggregated into the total portfolio.]

Here SR stands for "Sovereign Debt Risk" (by definition, given by the sum of B1 and B2) and
MR stands for "Market Risk" (MR = S1 + S2). Also B1 stands for "Bonds country 1", and S1
for "Stocks country 1". It is clear however that one might also consider a different tree:

[Figure: aggregation tree in which B1 and S1 are aggregated into RC1, B2 and S2 are aggregated into RC2, and RC1 and RC2 are then aggregated into the total portfolio.]

where now RC1 stands for "Risk of Country 1". The difference between these two trees is simply
the permutation (B2 ↔ S1), and possibly also the copulas used. In other words, the order of
aggregation is different. Although these two trees aggregate the same individual risks with
given marginals, they will not lead to the same distribution function for the total portfolio in
general, except in trivial cases, e.g. if the identity copula is used. In particular, the sum at risk,
and therefore the computed diversification benefit, will differ.

This is both an advantage and a drawback of the method. It is a drawback in the sense that,
even in such a simple case, an intrinsic ambiguity of the method arises. It is an advantage
however, because this freedom in aggregating the risks together can be settled by actuarial
experts with the aim of staying as close as possible to the business in the modelling choices. In
the above fictitious example, the choice between the two trees largely depends on whether
countries 1 and 2 have strongly dependent economies (e.g. France and Germany), in which case
the first tree is relevant, or on the contrary almost independent economies (then the
second tree is relevant, because it first models regional risks, which are then connected via the
last aggregation step, i.e. via the dependence on the worldwide economy).

One conclusion is therefore that, as the world and the portfolios evolve with time, one
should not only focus on regularly updating the dependency parameters, but should also be
concerned with updating the structure of the tree itself and its adequacy with the risk profile
of the company.



E. Plan of the paper 

Summarizing this introduction, we wish to address in this paper the issue of the aggregation of 
correlated risks and the impact of the modelling choices on the diversification benefit. There 
are various methods to do so. The present paper focuses on a particular method referred to as 
the hierarchical aggregation of risks, following the ideas found in [6, 10].

The aim of this paper is twofold. First, we wish to provide a clean definition of the aggregation
process described in this introduction, explain the rationale behind the graphical (tree)
representation, and investigate the links with portfolio aggregation and incomplete information
on the full risk profile (i.e. on the joint distribution). This will be done in Section II (see also
Section VI).



Second, the main motivation is to study the influence of the structure (or shape) of the tree
on the diversification benefit. As far as we are aware, this has not been studied before. We
focus first on Gaussian Trees (to be defined in Section IV). To get rid of the non-commutativity
effect illustrated in Section I D, we model every individual risk by the same distribution (a
Gaussian), and we moreover apply the same (Gaussian) copula at every aggregation step.
Such a tree, which we dub "Regular Gaussian Tree", is admittedly of little phenomenological
relevance because insurance risks usually have fatter-tailed distributions than the Gaussian.
Also, realistic trees have no regular shape in general, and finally the Gaussian copula does not
bring enough tail dependency between risks (see [11]).

However, the Gaussian Tree has the major advantage of being completely solvable, and provides
exact analytical (and very enlightening) results for the behaviour of the diversification benefit
as a function of the shape of the tree (see Section IV). One of the main conclusions is that "thin"
trees diversify better than "fat" trees, meaning that the width and the depth of the tree are of
considerable importance for the diversification benefit, for a given fixed number of individual risks
aggregated together.

In particular, the results on the Gaussian tree naturally explain why one might still get quite
a high diversification benefit although the dependencies set at each aggregation step are high,
provided the tree is thin enough. The explanation is rather intuitive, and amounts to the
following: two individual risks which are far away from each other in the tree are connected via
a node at the top of the tree, and therefore via several successive applications of the Gaussian
copula. This has the effect of lowering (possibly drastically) their effective coupling, i.e. their
correlation coefficient. This again makes sense from a business perspective (provided the tree is
constructed in such a way that it fits the business), as there is no reason why very different
lines of business like, say, Japan Earthquake and Medical Malpractice in UK should be in any way
connected to each other.

In order to go beyond the Gaussian model, we also provide similar numerical results for the
LogNormal Trees and for various choices of copulas (see Section V). We show that the behaviour
seen in the Gaussian tree still holds in more realistic and more complex situations for which an
analytical treatment is out of reach. In other words, and anticipating one of our conclusions,
the Gaussian tree imposes itself as the natural benchmark to which any other tree should be
compared.

Section VI finally goes back to the incomplete information problem. We show that the
hierarchical aggregation procedure discussed here can lead to a completely specified model (full
knowledge of the joint distribution) whenever some additional conditions (conditional
independence statements) between the risks are added.



II. COPULA-BASED HIERARCHICAL AGGREGATION OF CORRELATED RISKS 



In this section we describe the minimal mechanism that enables one to formally compute the
distribution function of the total portfolio in such a way that it fits with the idea of hierarchical
aggregation briefly described in the Introduction. It is minimal in the sense that no additional
hypotheses are needed to compute the total portfolio. In particular, the method below neither
requires the knowledge of the full joint distribution of the individual risks, nor specifies it.



The last section of the paper (Section VI) will discuss how extra conditions can be added to 
this minimal construction in order to define completely the joint distribution. As far as the 
diversification benefit is concerned, however, the distribution of the total portfolio is enough 
information and we do not need any additional knowledge on the joint distribution.

A. Top-down point of view: tree structure and summation decomposition

Let $X_i$ be the individual risks (hereafter also called the leaves of the tree), and let $Z = \sum_i X_i$
be the total portfolio. We also denote with a bold face symbol the vector formed by these
individual risks: $\mathbf{X} = (X_1, \dots, X_N)$.

The sum at risk associated to $Z$ is obtained by applying some risk measure on its distribution.
Equivalently, and more conveniently for the present analysis, the distribution of $Z$ is equally
well given by its characteristic function:



$$\Phi_Z(t) = E[\exp(itZ)] = E\Big[\exp\Big(it \sum_i X_i\Big)\Big]$$  (II.1)



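There are different ways of computing it. Were the joint distribution of the $X_i$'s known, one
could simply use the joint characteristic function $\Phi_{\mathbf{X}}(t_1, t_2, \dots, t_N) = E[\exp(i\,\mathbf{t} \cdot \mathbf{X})]$ to get:

$$\Phi_Z(t) = \Phi_{\mathbf{X}}(t, t, \dots, t)$$  (II.2)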



This way of computing the sum corresponds to a direct aggregation of the risks, i.e. to a
one-level tree (hereafter also called a flat tree):

[Figure: flat tree in which the root Z is directly connected to the N leaves X_1, ..., X_N.]
Imagine however the case where one does not have all the information on the joint distribution
function, but only partial information on, e.g., some joint distributions between subsets of the
leaves. For instance, split the vector of the $X_i$'s in two pieces $(X_1, \dots, X_j)$ and $(X_{j+1}, \dots, X_N)$,
and define $Y_1 = X_1 + \dots + X_j$ and $Y_2 = X_{j+1} + \dots + X_N$. Then, as a matter of fact, the
characteristic function of $Z$ can also be written as:

$$\Phi_Z(t) = E\Big[\exp\Big(it \sum_i X_i\Big)\Big] = E\big[\exp\big(i(tY_1 + tY_2)\big)\big] = \Phi_{Y_1,Y_2}(t,t)$$  (II.3)

where we introduced the characteristic function $\Phi_{Y_1,Y_2}(t_1,t_2)$ that describes the joint
distribution of the variables $Y_1$ and $Y_2$, which are partial sums of the leaves. This shows that the
distribution of $Z$ might be computed even if the joint distribution of $\mathbf{X}$ is not known. Here it
would be enough to know the joint distribution of $Y_1$ and $Y_2$. Of course we could have split the
sum $Z$ in more than two terms. This way of computing $Z$, via a decomposition of the sum into
pieces for which one does have some information, exactly corresponds to the (intuitive) idea of
aggregating risks in a hierarchical tree². In the above example, the tree is rather simple with
two levels:

[Figure: two-level tree in which the root Z has two children Y_1 and Y_2; Y_1 is the parent of the leaves X_1, ..., X_j and Y_2 is the parent of the leaves X_{j+1}, ..., X_N.]

Of course the procedure could iterate to fit more complex aggregation trees. For example the
joint distribution of the $(X_1, \dots, X_j)$ might be unknown (and so would be $Y_1$), and one should
therefore split this vector further, as e.g. $Y_1 = W_1 + W_2$ with $W_1 = X_1 + \dots + X_h$ and
$W_2 = X_{h+1} + \dots + X_j$. Then $Y_1$ is calculable, provided the joint distribution of $W_1$ and $W_2$
is known.
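
As a worked illustration of the two-term decomposition above, consider the special case (an assumption made here only for the sake of the example) where $(Y_1, Y_2)$ is a centred bivariate normal with standard deviations $\sigma_1$, $\sigma_2$ and correlation $\rho$. Its joint characteristic function is known in closed form, and evaluating it on the diagonal $t_1 = t_2 = t$ gives the characteristic function of $Z$:

```latex
\Phi_{Y_1,Y_2}(t_1,t_2)
  = \exp\!\Big(-\tfrac{1}{2}\big(\sigma_1^2 t_1^2 + 2\rho\,\sigma_1\sigma_2\, t_1 t_2 + \sigma_2^2 t_2^2\big)\Big),
\qquad
\Phi_Z(t) = \Phi_{Y_1,Y_2}(t,t)
  = \exp\!\Big(-\tfrac{1}{2}\, t^2 \big(\sigma_1^2 + 2\rho\,\sigma_1\sigma_2 + \sigma_2^2\big)\Big).
```

In this case $Z$ is therefore normal with variance $\sigma_1^2 + 2\rho\,\sigma_1\sigma_2 + \sigma_2^2$, and this conclusion was reached without any reference to the joint law of the individual leaves.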

B. Bottom-up point of view: tree structure and risk aggregation 



As discussed in the introduction, in risk management analysis we are more concerned with a
"bottom-up" point of view, where the marginals of the individual risks are usually known (from
historical data, modelling, or any other methods), but their joint distribution is poorly known.
Copulas, as we argued in the Introduction, are well-known and powerful tools that make it possible
to define a joint distribution function from given marginals [4, 5]. Copulas originate from Sklar's theorem
[12], which essentially states that for any continuous distribution function $F$ with marginals
$F_1, \dots, F_d$, there exists one and only one function $C : [0, 1]^d \to [0, 1]$ such that $F$ can be written
as

$$F(x_1, \dots, x_d) = C(F_1(x_1), \dots, F_d(x_d))$$  (II.4)

One is therefore able to construct dependency structures (joint distributions) starting from
some given marginals $F_i$ and using appropriate functions $C$ (there are conditions on $C$ for it
to be a copula, see e.g. [5]).
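
The construction can be sketched numerically as follows. The snippet draws pairs from a bivariate Gaussian copula and plugs them into the inverse distribution functions of two marginals. The exponential marginals, the parameter values and all function names are illustrative assumptions, not choices made in this paper.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal, used to map normals to uniforms

def gaussian_copula_pair(rho, rng):
    """Draw (u1, u2) from a bivariate Gaussian copula with parameter rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)

def exponential_quantile(u, mean):
    """Inverse distribution function of an exponential marginal (illustrative)."""
    u = min(u, 1.0 - 1e-12)          # guard against log(0)
    return -mean * math.log1p(-u)

rng = random.Random(1)               # fixed seed, for reproducibility only
rho = 0.7
pairs = [gaussian_copula_pair(rho, rng) for _ in range(50_000)]
# Joint law with prescribed marginals: F(x1, x2) = C(F1(x1), F2(x2)).
losses = [(exponential_quantile(u1, 1.0), exponential_quantile(u2, 2.0))
          for u1, u2 in pairs]
mean1 = sum(x for x, _ in losses) / len(losses)
mean2 = sum(y for _, y in losses) / len(losses)
print(mean1, mean2)
```

The sample means stay close to those of the chosen marginals (1 and 2 here), since the copula only affects how the two risks move together and leaves each marginal untouched.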

In the bottom-up point of view, one first splits the leaves $X_i$ into $r$ disjoint sets, on the basis
that (e.g.) the risks belonging to the same set share some common features
and are therefore correlated. Then one joins together the leaves in these sets via some
multidimensional copulas $C_s$ for $s \in [1, r]$. This, according to Sklar's theorem, defines their
joint distributions. One thus ends up with $r$ multivariate distributions, and the partial sums of
these risks $Y_1, \dots, Y_r$ can be computed directly. The next level of aggregation consists in identifying
again disjoint sets in the set $Y_1, \dots, Y_r$, and proceeding similarly thereafter. Aggregation stops
when the last copula ties together all the partial sums.

² Except that the point of view is top-down, whereas aggregation intuitively is bottom-up, see next subsection.

It should be clear to the reader that the above bottom-up procedure fits perfectly well with
the previous discussion (top-down approach), because once the copula is given, so is the joint
distribution, and so is also the joint characteristic function. Let us however emphasize again
that the procedure does not assume the knowledge of (nor specify) the full joint distribution.
As explained, this does not spoil our ability to compute the distribution of the total portfolio.

C. Regular Hierarchical Trees: Definition and notations 

In the following we will refer to trees in the usual sense of graph theory. Moreover we will
restrict ourselves, for simplicity, to "regular" trees.

Definition 1 (Regular Hierarchical Trees). Let $k \geq 2$ and $m \geq 1$. A tree is said to be a $(k, m)$
regular tree if each node of the tree except the leaves has a common number $k \geq 2$ of children.

Each layer (or level) of the tree is labeled by an integer $p$ ranging from 0 to $m$. By definition
the root corresponds to the level 0 while the leaves populate the level $m$.

The number $N^{(p)}$ of nodes at level $p$ is therefore $N^{(p)} = k^p$, and the total number of leaves,
in particular, is $N^{(m)} = k^m$. In all the following, a superscript or subscript in parentheses, $(m)$ or $(p)$,
will always label the layer of the tree.

In our construction, the tree is meant to be a useful graphical representation of random variables
and their relationship. Therefore we also need to assign random variables to the nodes of the
tree. We adopt the following conventions: the leaves of the tree are the individual risks. They
are represented by random variables denoted by $X_i^{(m)}$, for $i \in [1, k^m]$. Similarly, at each level
$p$, the nodes of the tree represent random variables denoted by $X_i^{(p)}$.

Because of the aggregation mechanism, all these random variables are not independent of
each other. On the contrary, any random variable associated to a node is given by the sum
of the random variables associated to its children (except for the leaves). Moreover, hierarchical
aggregation, as discussed in the introduction, also means assuming some dependencies between
the nodes via the use of multivariate copulas (here $k$-dimensional copulas). Hence we have the
following definition:

Definition 2 (Hierarchical Aggregation Mechanism). Let $T$ be a $(k, m)$ regular tree as defined
above. Let $p \neq m$, and let $\mathcal{J}_i^{(p)}$ be the set of children of the node $X_i^{(p)}$. Then, the aggregation
mechanism refers to the following. For all $p \neq m$ and relevant indices $i$, the $k$ nodes which
belong to $\mathcal{J}_i^{(p)}$:

1. Are joined together via some $k$-dimensional copula $C$ according to Eq. (II.4).

2. Are summed up to form their parent: $X_i^{(p)} = \sum_{X \in \mathcal{J}_i^{(p)}} X$.

Applying iteratively the above formula enables one to cascade through the full tree so that, in
particular, the root of the tree corresponds to the full portfolio $Z = X_1^{(0)} = \sum_{i=1}^{k^m} X_i^{(m)}$.

III. DIVERSIFICATION BENEFIT AND RISK MEASURE: DEFINITIONS 

This section provides precise definitions for the diversification benefit. As discussed briefly in
the Introduction, this requires a risk measure, see e.g. [4]. We advocated briefly the use of the
Expected Shortfall or Tail-Value-At-Risk at some threshold $\alpha$. Actually we will use a slightly
different risk measure. We will consider the xTVaR, which is the difference between the TVaR
and the expectation value, because it is more representative of the value at risk for a company.
Thus, $\mathrm{xTVaR}_\alpha[X] = E[X] - \mathrm{TVaR}_\alpha[X]$ for a random variable $X$, whereas, as usual, the TVaR is
defined by

$$\mathrm{TVaR}_\alpha[X] = \frac{1}{\alpha} \int_0^\alpha \mathrm{VaR}_u[X]\,du$$  (III.1)

when it exists, and where $\mathrm{VaR}_u$ is the Value-At-Risk at level $u$. Typically we choose $\alpha = 0.01$,
although the results for the Gaussian Tree will not depend on $\alpha$, see Section IV.
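
For a normal risk these definitions can be checked numerically. The sketch below takes $\mathrm{VaR}_u$ to be the $u$-quantile of the distribution (a sign convention assumed here, not stated explicitly in the text), approximates the integral of Eq. (III.1) with a midpoint rule, and compares $E[X] - \mathrm{TVaR}_\alpha[X]$ with the standard closed form for the lower-tail mean of a normal. The function names and the parameter values are illustrative.

```python
from statistics import NormalDist

def tvar_numeric(dist, alpha, n_steps=20_000):
    """Midpoint-rule approximation of (1/alpha) * integral_0^alpha VaR_u du,
    with VaR_u taken as the u-quantile of `dist` (assumed convention)."""
    h = alpha / n_steps
    total = sum(dist.inv_cdf((j + 0.5) * h) for j in range(n_steps))
    return total * h / alpha

def xtvar_normal(sigma, alpha):
    """Closed form of E[X] - TVaR_alpha[X] for a normal with std deviation sigma."""
    z = NormalDist().inv_cdf(alpha)
    return sigma * NormalDist().pdf(z) / alpha

alpha = 0.01
risk = NormalDist(mu=5.0, sigma=2.0)           # illustrative parameters
xtvar_num = risk.mean - tvar_numeric(risk, alpha)
print(xtvar_num, xtvar_normal(2.0, alpha))
```

Under this convention the xTVaR of a normal does not depend on its mean and is proportional to its standard deviation, which is the property the Gaussian tree relies on.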



We then define three sums at risk, two of which correspond to fictitious extreme cases (zero or full
dependency between the risks):

1. The "standalone" sum at risk, which is the sum of all the sums at risk of the individual
risks $X_i$: $S_1 = \sum_i \mathrm{xTVaR}_\alpha(X_i)$. This is also the sum at risk of the full portfolio when
all the individual risks are fully dependent on each other.

2. The actual sum at risk: $S_Z = \mathrm{xTVaR}_\alpha(Z) = \mathrm{xTVaR}_\alpha(\sum_i X_i)$. This quantity depends
on the dependencies that exist between the individual risks.

3. The sum at risk while assuming that the individual risks $X_i$ are independent of each
other: $S_0 = \mathrm{xTVaR}_\alpha(Z)|_{\text{no dependency}}$.

Clearly the actual sum at risk $S_Z$ can range from $S_0$ (no dependencies, maximal diversification,
minimal sum at risk) to $S_1$ (full dependency between the individual risks, zero diversification
and maximal sum at risk). Therefore we have $S_0 \leq S_Z \leq S_1$, and this suggests defining a
dimensionless coefficient $\eta$ as

$$\eta = \frac{S_Z - S_0}{S_1 - S_0}$$  (III.2)

This parameter, which we call the diversification factor, ranges from 0 to 1 and measures how
close the actual portfolio is to the zero dependency case (high diversification and $\eta \approx 0$) or to
the full dependency case (low diversification and $\eta \approx 1$). Yet another instructive quantity is
the diversification benefit itself, DB, which is usually defined as:

$$\mathrm{DB} = 1 - \frac{S_Z}{S_1}$$  (III.3)
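
The two indicators can be sketched in a few lines, assuming the definitions read $\eta = (S_Z - S_0)/(S_1 - S_0)$ and $\mathrm{DB} = 1 - S_Z/S_1$. The function names and the three sums at risk used below are hypothetical values chosen for illustration.

```python
def diversification_factor(s0, sz, s1):
    """eta = (S_Z - S_0) / (S_1 - S_0): 0 for independent risks, 1 for full dependency."""
    return (sz - s0) / (s1 - s0)

def diversification_benefit(sz, s1):
    """DB = 1 - S_Z / S_1: relative reduction with respect to the standalone sum."""
    return 1.0 - sz / s1

# Hypothetical sums at risk, in arbitrary monetary units.
s0, sz, s1 = 40.0, 70.0, 100.0
eta = diversification_factor(s0, sz, s1)
db = diversification_benefit(sz, s1)
print(eta, db)
```

With these numbers the portfolio sits halfway between the independent and the fully dependent cases, and aggregation reduces the standalone sum at risk by about 30%.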

IV. DIVERSIFICATION IN HIERARCHICAL GAUSSIAN TREES 

In this section we study and solve completely the hierarchical regular Gaussian tree, compute its
diversification, and study its behavior with respect to the shape of the tree. Section IV A specifies
the setup. Section IV B gives the total portfolio as a result of the aggregation. Section IV C gives
the diversification of the Gaussian Tree, and Section IV D discusses the influence of the shape
of the tree on diversification.

A. Definition of the Gaussian Tree 

We study a regular $(k, m)$ aggregation tree as defined in Definitions 1 and 2. We further simplify
the problem by assuming the following:

Assumption 1 (Individual risks are Gaussian). The individual risks at the bottom of the tree
are all modeled with the same univariate Gaussian with zero mean³ and variance $\sigma_{(m)}^2$:

$$\forall i, \quad X_i^{(m)} \sim \mathcal{N}(0, \sigma_{(m)}^2)$$  (IV.1)

Assumption 2 (Gaussian copula is used). Any aggregation step (see Definition 2) consists in
tying together $k$ nodes via the $k$-dimensional equicorrelation Gaussian copula of dependency
parameter $\rho$. This means that the $k \times k$ correlation matrix defining the Gaussian copula reads:

$$\Sigma = \mathbb{1}_k + \rho\,(J_k - \mathbb{1}_k)$$  (IV.2)

where $\mathbb{1}_k$ refers to the identity matrix of rank $k$, $J_k$ is the $k \times k$ matrix everywhere filled by 1,
and the condition $-\frac{1}{k-1} < \rho < 1$ is assumed to hold for definiteness.

Appendix A provides some useful and well-known material for handling multivariate normals
and Gaussian copulas.

B. Result of the aggregation process 

The aggregation process in the Gaussian tree results in the following:

Theorem IV.1. Any node $X_i^{(p)}$ of the regular Gaussian tree is a univariate normal
$X_i^{(p)} \sim \mathcal{N}(0, \sigma_{(p)}^2)$ with zero mean and standard deviation $\sigma_{(p)}$ given by:

$$\sigma_{(p)} = \sigma_{(m)} \big(k + (k^2 - k)\rho\big)^{(m-p)/2}$$  (IV.3)

In particular the total portfolio $Z$ is $Z \sim \mathcal{N}(0, \sigma_Z^2)$ with

$$\sigma_Z = \sigma_{(m)} \big(k + (k^2 - k)\rho\big)^{m/2}$$  (IV.4)
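
As a quick sanity check of this closed form, the sketch below compares it with a direct level-by-level aggregation, using the fact that the sum of $k$ equicorrelated normals with common standard deviation $\sigma$ has variance $k\sigma^2 + k(k-1)\rho\,\sigma^2$. The function names and the numerical values are illustrative assumptions.

```python
import math

def sigma_closed_form(k, m, p, rho, sigma_leaf):
    """Standard deviation of a node at level p according to the closed form."""
    return sigma_leaf * (k + (k * k - k) * rho) ** ((m - p) / 2)

def sigma_by_aggregation(k, m, p, rho, sigma_leaf):
    """Same quantity obtained by aggregating level by level from the leaves."""
    sigma = sigma_leaf
    for _ in range(m - p):
        # Variance of the sum of k equicorrelated normals with common std sigma.
        variance = k * sigma ** 2 + k * (k - 1) * rho * sigma ** 2
        sigma = math.sqrt(variance)
    return sigma

k, m, rho, sigma_leaf = 3, 4, 0.3, 1.5      # illustrative values
for p in range(m + 1):
    print(p, sigma_closed_form(k, m, p, rho, sigma_leaf),
          sigma_by_aggregation(k, m, p, rho, sigma_leaf))
```

Both columns agree at every level, from the leaves ($p = m$) up to the root ($p = 0$).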

Proof: Let us start with the first aggregation step. Let $N = k^m$. At the bottom of the tree,
there are $N/k$ groups of $k$ leaves that are tied together via the equicorrelation Gaussian copula.
Consider for instance the first group $X_1^{(m)}, \ldots, X_k^{(m)}$. By definition of Gaussian copulas, these
$k$ univariate normals which are tied together via a Gaussian copula with correlation matrix $\Sigma$
have a joint distribution which is a $k$-variate normal with covariance matrix $\sigma_{(m)}^2 \times \Sigma$ (see
Appendix A). Then it follows from Corollary A.2 that the sum

$$X_1^{(m-1)} = \sum_{i=1}^{k} X_i^{(m)}$$

is a univariate normal with variance given by Eq. (A.5) and a mean given by the sum of the
means (hence zero here). Applying the result to the equicorrelation matrix defined in Eq. (IV.2)
shows that the sum is given by

$$X_1^{(m-1)} \sim \mathcal{N}\left(0, \sigma_{(m-1)}^2\right) \qquad \text{(IV.5)}$$

with

$$\sigma_{(m-1)} = \sigma_{(m)} \left(k + (k^2 - k)\rho\right)^{1/2} \qquad \text{(IV.6)}$$

This reasoning holds for all the other groups of $k$ nodes at the bottom of the tree. The level
$(m-1)$ of the tree is thus completely determined: it is populated by the $X_i^{(m-1)}$ for
$i = 1, \ldots, k^{m-1}$, which are all univariate normal with zero mean and variance given by Eq. (IV.6).
By definition of the aggregation mechanism, the process then repeats at the upper levels. The
theorem thus follows from a trivial iteration, and by solving the recurrence equation for the
variances, Eq. (IV.6). □

† (Footnote to Assumption 1.) The means are not relevant since we use the xTVaR risk measure, and
can therefore be set to 0 without loss of generality.
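The iteration used in the proof can be checked numerically. The following is a minimal sketch, not part of the original derivation: it iterates the one-step recurrence of Eq. (IV.6) from the leaves up to the root and compares the result with the closed form of Eq. (IV.3). The function names and the default leaf standard deviation are illustrative assumptions.

```python
import math

def sigma_levels(k, m, rho, sigma_leaf=1.0):
    """Standard deviation at every level p = m, ..., 0, obtained by iterating
    the one-step recurrence sigma_(p-1) = sigma_(p) * sqrt(k + (k^2 - k) * rho)."""
    sigmas = {m: sigma_leaf}
    for p in range(m, 0, -1):
        sigmas[p - 1] = sigmas[p] * math.sqrt(k + (k * k - k) * rho)
    return sigmas

def sigma_closed_form(k, m, p, rho, sigma_leaf=1.0):
    """Closed form sigma_(p) = sigma_(m) * (k + (k^2 - k) * rho)^((m - p) / 2)."""
    return sigma_leaf * (k + (k * k - k) * rho) ** ((m - p) / 2)

# Compare both computations on the (k = 3, m = 6) tree with rho = 0.4.
levels = sigma_levels(k=3, m=6, rho=0.4)
for p in sorted(levels, reverse=True):
    print(p, levels[p], sigma_closed_form(3, 6, p, 0.4))
```

Both columns agree up to floating-point rounding, which is all the "trivial iteration" amounts to.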



C. Diversification in Gaussian Trees



Based on Theorem IV.1 we compute the three sums at risk $S_0$, $S_Z$ and $S_1$ as defined in Section III
and then compute $\eta$ and DB. The full portfolio is univariate normal with zero mean and
standard deviation $\sigma_Z$. A well-known result states that the $\mathrm{TVaR}_\alpha$ of univariate normals
$\mathcal{N}(\mu, \sigma^2)$ with mean $\mu$ reads $\mathrm{TVaR}_\alpha = \mu + f(\alpha)\sigma$ for some function $f$. As a
consequence the xTVaR simply reads $f(\alpha)\sigma$.

It then follows that the sum at risk $S_Z$ reads $f(\alpha)\sigma_Z$. Taking the limits $\rho \to 0$ and $\rho \to 1$,
we finally get:

$$S_0 = f(\alpha)\,\sigma_{(m)}\,k^{m/2}$$

$$S_1 = f(\alpha)\,\sigma_{(m)}\,k^{m}$$

$$S_Z = f(\alpha)\,\sigma_{(m)}\left(k + (k^2 - k)\rho\right)^{m/2}$$

One easily derives the diversification benefit

$$DB(k, m, \rho) = 1 - \left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)^{m/2} \qquad \text{(IV.7)}$$

and the diversification factor

$$\eta(k, m, \rho) = \frac{\left(k + (k^2 - k)\rho\right)^{m/2} - k^{m/2}}{k^{m} - k^{m/2}}$$

as functions of $k$ (width of the tree), $m$ (depth of the tree), and $\rho$ the dependency. We also
recall that the number of individual risks is given by $N = k^m$. Notice that $f(\alpha)$ has dropped
from the final result.
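As a hedged illustration of these closed forms, the sketch below evaluates the three sums at risk and checks that $f(\alpha)$ indeed cancels. It assumes that DB is $1 - S_Z/S_1$ and that $\eta$ is $(S_Z - S_0)/(S_1 - S_0)$, which is the reading of the Section III definitions consistent with the formulas above. It also assumes that the TVaR is the expectation beyond the $(1-\alpha)$ quantile. All function names are illustrative.

```python
import math
from statistics import NormalDist

def f_alpha(alpha):
    """xTVaR of a standard normal, assuming TVaR is the mean beyond the (1 - alpha) quantile."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return NormalDist().pdf(z) / alpha

def sums_at_risk(k, m, rho, sigma_leaf=1.0, alpha=0.01):
    """Return (S_0, S_Z, S_1) for the regular (k, m) Gaussian tree."""
    f = f_alpha(alpha)
    s0 = f * sigma_leaf * k ** (m / 2)                          # zero dependency
    sz = f * sigma_leaf * (k + (k * k - k) * rho) ** (m / 2)    # dependency rho
    s1 = f * sigma_leaf * k ** m                                # full dependency
    return s0, sz, s1

def diversification_benefit(k, m, rho):
    """Closed form for DB; independent of alpha and of the leaf variance."""
    return 1.0 - (1.0 / k + (1.0 - 1.0 / k) * rho) ** (m / 2)

def diversification_factor(k, m, rho):
    """Closed form for eta; independent of alpha and of the leaf variance."""
    return ((k + (k * k - k) * rho) ** (m / 2) - k ** (m / 2)) / (k ** m - k ** (m / 2))

s0, sz, s1 = sums_at_risk(k=3, m=6, rho=0.4)
print(1.0 - sz / s1, diversification_benefit(3, 6, 0.4))
print((sz - s0) / (s1 - s0), diversification_factor(3, 6, 0.4))
```

Changing `alpha` or `sigma_leaf` rescales the three sums at risk but leaves both printed ratios unchanged.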
D. "Thin" trees diversify better than "fat" trees

We now discuss the behaviour of the diversification as a function of the shape of the tree.
Fig. 1 shows a family of curves that represent the behaviour of $\eta$ as a function of $\rho$, for
different values of $m$ but a fixed $k = 3$. Notice that the number of leaves is not constant from
one curve to the next. Observe in Fig. 1 how increasing the depth of the tree makes the curves
for $\eta$ more and more convex, whereas the curve for the flat tree $m = 1$ is concave.

Figure 1: Typical behaviour of $\eta$ (y-axis) with $\rho$ (x-axis) in a $(k, m)$ Gaussian tree. Here $k$ is set to 3
and $m$ ranges from 1 (top curve) to 10 (bottom curve). The $m = 1$ case corresponds to the flat tree
with a concave curve for $\eta$. However, the curves become convex for $m \geq 3$, hence giving a small $\eta$ even
if $\rho$ is large.

This means that, for a given $\rho$, the hierarchical tree diversifies much better than the flat tree
(since $\eta$ is smaller). Moreover convexity implies that in hierarchical trees the diversification
remains high even for large values of $\rho$.

One may object that this simply comes as a consequence of the increasing number of individual risks.
However, we now show that the shape of the tree also plays a role per se in the diversification
of risks. Whenever $N$ is fixed, $m$ and $k$ are related through $N = k^m$. Qualitatively speaking,
therefore, the difference between two trees $(k, m) \neq (k', m')$ is a sort of opening angle, and we
have the following result:

Proposition IV.2 (Thin trees diversify better than fat trees). Let $(k, m)$ and $(k', m')$ be such
that $k'^{m'} = k^m = N$. Then $DB' < DB$ if $k' > k$.
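Thus the diversification benefit is smaller for trees with larger width $k$. The proof is
straightforward. One writes the diversification benefits as $DB' = 1 - q'$ and $DB = 1 - q$, uses
$m = \ln N / \ln k$, and forms the ratio $q'/q$, which in logarithmic form reads

$$\ln\frac{q'}{q} = \frac{\ln N}{2}\left[\frac{\ln\left(\frac{1}{k'} + \left(1 - \frac{1}{k'}\right)\rho\right)}{\ln k'} - \frac{\ln\left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)}{\ln k}\right]$$

For $0 < \rho < 1$ the function $\ln\left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)$ is a convex function of $\ln k$ that vanishes at
$k = 1$, so its ratio to $\ln k$ increases with $k$. This shows that $q' > q$ when $k' > k$. □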

The following example illustrates this. We take e.g. $\rho = 0.4$, and $k = 2$, $m = 10$, $N = 1024$.
Then we find $DB \approx 83\%$ and $\eta \approx 0.14$. However, with $k = 4$, $m = 5$, $N = 1024$ we find
$DB \approx 77\%$ and $\eta \approx 0.2$.
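The same comparison can be run over every regular shape with a given number of leaves. The sketch below is an illustration only, with hypothetical helper names: it enumerates all $(k, m)$ with $k^m = N$ and evaluates the closed form for DB, so that the ordering claimed by Proposition IV.2 can be inspected directly.

```python
def db(k, m, rho):
    """Diversification benefit of the regular (k, m) Gaussian tree, Eq. (IV.7)."""
    return 1.0 - (1.0 / k + (1.0 - 1.0 / k) * rho) ** (m / 2)

def shapes(n):
    """All regular shapes (k, m) with k >= 2 and k**m == n, ordered by increasing width k."""
    found = []
    for k in range(2, n + 1):
        m, size = 0, 1
        while size < n:
            size *= k
            m += 1
        if size == n:
            found.append((k, m))
    return found

N, rho = 4096, 0.4
for k, m in shapes(N):
    # Thin trees (small k, large m) come first, the flat tree (k = N, m = 1) last.
    print(f"k={k:5d}  m={m:2d}  DB={db(k, m, rho):.3f}")
```

With these inputs the printed DB decreases monotonically from the thinnest shape to the flat tree.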

To make some contact with the numerical studies done in Section V, Figure 2 also compares
the diversification benefits for the flat Gaussian tree $(k = 729, m = 1)$ and for some hierarchical
Gaussian tree $(k = 3, m = 6)$. As Figure 2 shows, the hierarchical tree diversifies much more
than the flat tree.

Figure 2: Diversification benefit in percent as a function of $\rho$ for both the flat and hierarchical Gaussian
trees, with parameters as given in the text.



This result regarding the influence of the shape of the tree on its diversification should not come
as a surprise. As discussed briefly in the introduction, it is quite intuitive because
individual risks are typically dependent on each other via some upper node in the tree, i.e.
after several applications of the Gaussian copula. This has the tendency to lower
the effective correlation between these risks, and this happens more often in thin trees than in
fat trees. Equation (IV.7) illustrates this with the emergence of powers of $\rho$ in the result for
diversification.

The above discussion cannot be more than qualitative. Indeed the joint distribution of the
leaves is not completely known and we cannot compute the correlation coefficients between
distant individual risks. Section VI will however show that when one adds some conditions that
fully specify the distribution, one is able to compute these effective couplings,
and to show that they decrease with the distance between the individual risks in the tree.

V. DIVERSIFICATION IN HIERARCHICAL LOGNORMAL TREES

In this section we add to the full treatment given in the Gaussian case some numerical results
on LogNormal trees. By this we mean hierarchical regular trees as defined previously, but where the
individual risks are now described by the same common LogNormal distribution. Our aim is to
see whether the previous conclusions in the Gaussian case also hold in more realistic* settings.

A. Methodology and parameterization of the tree 

In this section we set $k = 3$ and $m = 6$, i.e. we consider a tree which aggregates together
$N = 3^6 = 729$ individual risks. The aggregation has been done by means of a Gaussian copula
of parameter $\rho$, and also with a Clayton copula of parameter $\theta$. Diversification benefit DB and
diversification factor $\eta$ have been evaluated numerically with a threshold $\alpha = 0.01$, using the
IGLOO software [13]. Results are systematically compared to the diversification of a flat tree
$k = 729$, $m = 1$.

The results show that the hierarchical tree diversifies much better than the flat tree. Moreover
the results show that the lognormal hierarchical tree also tends to make the curve for the
diversification factor $\eta$ more convex, especially for Gaussian copulas (see the graphs below). This
supports the relevance of the Gaussian tree as a toy-model of aggregation trees.

The parameters of the LogNormal have been chosen in order to fit realistic portfolios. Explicitly,
by imposing a mean of around 670'000 and a standard deviation of 8.1 million for these
LogNormals, one finds that the maximal sum at risk (full dependency) reads $S_1 = 22.4$
(in bn), and the minimal sum at risk (zero dependency) is approximately $S_0 = 1.3$ bn.
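The order of magnitude of the maximal sum at risk can be recovered in closed form. The sketch below is an illustration under stated assumptions, not the IGLOO computation used in the paper. It assumes that "full dependency" means comonotonic risks, so that the TVaR of the sum is the sum of the individual TVaRs, and that the xTVaR is the TVaR beyond the 99% quantile minus the mean. It uses the standard moment matching and tail expectation formulas of the LogNormal distribution. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_params(mean, std):
    """Parameters (mu, sigma) of the underlying normal, matched to the given mean and std."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def lognormal_xtvar(mean, std, alpha=0.01):
    """TVaR beyond the (1 - alpha) quantile minus the mean, for a single LogNormal risk."""
    _, sigma = lognormal_params(mean, std)
    z = NormalDist().inv_cdf(1.0 - alpha)
    # E[X | X > q] = E[X] * Phi(sigma - z) / alpha for a LogNormal X with quantile level 1 - alpha.
    tvar = mean * NormalDist().cdf(sigma - z) / alpha
    return tvar - mean

n_risks = 3 ** 6
s1 = n_risks * lognormal_xtvar(mean=670_000.0, std=8.1e6)
print(f"maximal sum at risk (comonotonic assumption): {s1 / 1e9:.2f} bn")
```

With the rounded inputs quoted in the text this gives a value of roughly 22 bn, the same order as the quoted $S_1$. The mean is only given as "around 670'000", so exact agreement is not expected. The minimal sum at risk $S_0$ has no comparable closed form and would require simulation.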



* LogNormal distributions are widely used in risk modelling in insurance companies, see [S] 
B. LogNormal Tree: Gaussian copula

The dependencies are set by a Gaussian copula with dependency parameter $\rho$. We computed
numerically the sum at risk $S_Z$ as a function of $\rho$, for both the $(k = 3, m = 6)$ hierarchical tree
and the flat $(k = 729, m = 1)$ tree.

Fig. 3 shows the sum-at-risk as a function of the dependency parameter $\rho$. Similarly Fig. 4
and Fig. 5 respectively show the behaviour of $\eta$ and DB. We see that, as in the Gaussian tree,
the following holds in the lognormal tree with Gaussian copula:

• The sum-at-risk stays almost constant for small values of $\rho$ in the hierarchical tree.

• The curve for $\eta$ becomes convex when adding new layers to the tree.

• The diversification benefit stays relatively constant for small values of $\rho$.

These properties all show that the successive applications of the Gaussian copula in the
hierarchical tree almost completely suppress the dependencies for not too large values of $\rho$.

Figure 3: LogNormal Tree with Gaussian copula. The xTVaR at 99% of the full portfolio, in billions,
as a function of $\rho$. This interpolates from $S_0 = 1.3$ to $S_1 = 22.4$ bn.



C. LogNormal Tree: Clayton copula 

We performed the same analysis, this time using a Clayton copula of parameter $\theta$ and
varying its strength. Figures 6, 7 and 8 respectively show (smoothed curves of) the sum-at-risk,
$\eta$ and DB as a function of the Clayton parameter $\theta$.

Figure 4: LogNormal Tree with Gaussian copula. The factor $\eta$ as a function of $\rho$. It is essentially a
rescaling of the previous curve.

Figure 5: LogNormal Tree with Gaussian copula. The diversification benefit in percent as a function
of $\rho$.

We note that the general shapes of the curves are rather different from the ones found using a
Gaussian copula. The curves for the flat tree are much more concave (for the sum at risk and
$\eta$) and the hierarchical aggregation mechanism only partially succeeds in turning them into convex
curves. Still, the hierarchical tree diversifies much better than the flat tree. In other words, the
diversification benefit fails to stay almost constant over some range, meaning that the hierarchical
tree is not able to balance completely the high dependency set by the Clayton copula.

Figure 6: LogNormal Tree with Clayton copula. The xTVaR at 99% of the full portfolio, in billions, as
a function of the Clayton parameter.

Figure 7: LogNormal Tree with Clayton copula. The factor $\eta$ as a function of the Clayton parameter.

Figure 8: LogNormal Tree with Clayton copula. The diversification benefit in percent, as a function
of the Clayton parameter.
VI. SPECIFYING FULLY THE TREE

We saw in Section II that the hierarchical aggregation mechanism does not fully specify the
joint distribution but rather provides the minimum amount of information needed to determine the
full portfolio.

A. Conditional Independent Trees 

Inspired by [10, Chp. 8], we now add some conditional independence statements to the regular
Gaussian tree and show that this specifies the joint distribution of the leaves of the tree.

Let us write $X < Y$ whenever $X$ is a child of $Y$. This defines a partial order relationship in the
tree. Then we define:

Definition 3 (Conditional independent trees). A tree $T$ is said to be conditional independent
iff for all $X, Y, S \in T$ such that $X < Y$ and $S \nless Y$, then $(X \perp S)\,|\,Y$.

Note that $(X \perp S)\,|\,Y \Leftrightarrow (S \perp X)\,|\,Y$. Then we have the following theorem.

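Theorem VI.1. There exists one and only one regular $(k, m)$ conditional independent Gaussian
tree $T_{k,m}$. The $N = k^m$ leaves of the tree are jointly normal with covariance matrix $C^{(m)}$ given
by the following recurrence relations:

$$C^{(1)} = \sigma_{(1)}^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \\ \rho & \cdots & \rho & 1 \end{pmatrix}$$

And, $\forall m \geq 2$:

$$C^{(m)} = \begin{pmatrix} C^{(m-1)} & B^{(m-1)} & \cdots & B^{(m-1)} \\ B^{(m-1)} & C^{(m-1)} & \ddots & \vdots \\ \vdots & \ddots & \ddots & B^{(m-1)} \\ B^{(m-1)} & \cdots & B^{(m-1)} & C^{(m-1)} \end{pmatrix}$$

where $C^{(m)}$ is here written as a $k \times k$ block matrix, with blocks of size $k^{m-1}$, and where the
diagonal blocks $C^{(m-1)}$ are understood after the renaming of the variances described at the
beginning of Appendix B. The recurrence relation between the variances is:

$$\sigma_{(p-1)} = \sigma_{(p)} \sqrt{k + (k^2 - k)\rho}$$

The matrix $B^{(m-1)}$ is given by $B^{(m-1)} = \beta_{m-1} J_{k^{m-1}}$ with

$$\beta_{m-1} = \rho\,\sigma_{(m)}^2 \left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)^{m-1}$$

Proof. See Appendix B.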



B. "Effective" correlations 



Results on the diversification in the Gaussian tree in Section IV made clear that applying
successive Gaussian copulas of parameter $\rho$ in the aggregation mechanism implies smaller and
smaller correlation coefficients between nodes that are not directly connected. This can now be
given an explicit formulation within the context of the conditional independent Gaussian tree.

Proposition VI.2. Two risks $X_i^{(m)}$ and $X_j^{(m)}$ which are first connected to each other via a node
located at level $p$ have between them the linear correlation coefficient
$\rho^{(m,p)} = \mathrm{cov}(X_i^{(m)}, X_j^{(m)}) / \sigma_{(m)}^2$ with:

$$\rho^{(m,p)} = \rho \left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)^{m-p-1} \qquad \text{(VI.1)}$$

Proof. See step 3 of Appendix B. The smallest correlation coefficients in the covariance matrix
are of course between the most distant leaves, i.e. between leaves that are connected to each
other only via the root of the tree ($p = 0$). This explains why the overall diversification benefit can
be large (or why the $\eta$ factor can be small) even if the dependency parameter $\rho$ is large (see
also Fig. 1), whenever $m$ is large enough.
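The effective correlations can be cross-checked against Theorem IV.1. The following sketch rests on illustrative assumptions: leaves are indexed from 0 to $k^m - 1$ so that siblings share the same integer quotient by $k$, and the helper names are hypothetical. It rebuilds every entry of the leaf covariance matrix from Eq. (VI.1) and verifies that the sum of all entries equals the variance of the root given by Eq. (IV.4).

```python
import math

def first_common_level(i, j, k, m):
    """Level p (root = 0, leaves = m) of the first node connecting two distinct leaves i and j."""
    steps = 0
    while i != j:
        i //= k
        j //= k
        steps += 1
    return m - steps

def effective_corr(p, k, m, rho):
    """Correlation of Eq. (VI.1) between two leaves first connected at level p."""
    return rho * (1.0 / k + (1.0 - 1.0 / k) * rho) ** (m - p - 1)

def total_variance(k, m, rho, sigma_leaf=1.0):
    """Sum of all entries of the leaf covariance matrix, i.e. the variance of the root."""
    n = k ** m
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                corr = 1.0
            else:
                corr = effective_corr(first_common_level(i, j, k, m), k, m, rho)
            total += sigma_leaf ** 2 * corr
    return total

k, m, rho = 3, 3, 0.4
print(total_variance(k, m, rho), (k + (k * k - k) * rho) ** m)
print([effective_corr(p, k, m, rho) for p in range(m - 1, -1, -1)])
```

The two numbers on the first line agree up to rounding, and the second line shows the correlation shrinking as the first common node moves from the parent ($p = m - 1$) to the root ($p = 0$).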



VII. CONCLUSIONS 



We provided a self-contained introduction to hierarchical aggregation of correlated risks within
trees of aggregation. We discussed the advantages of such a method and its relevance for the
financial industry, in particular for insurers and reinsurers. The mathematics underlying the
definition of a tree as a useful graphical representation has been made precise.

Thanks to a completely solvable model for aggregation trees, namely the Gaussian tree, we
showed how the shape of the tree, via its impact on the dependency structure, influences the
resulting diversification benefit. In particular we showed that the depth of the tree, or more
precisely its thinness, heavily influences its diversification.

More precisely, one of the main conclusions is that hierarchical trees tend to effectively
lower the dependency set at each step of the aggregation. As a consequence the
diversification benefit stays relatively large and constant for small dependency parameters, and
starts decreasing only for high values of the dependency parameter.

This should be understood as a systematic effect that should be kept in mind while modelling
risk aggregation: although a high dependency is applied at each aggregation step, the
diversification might stay relatively large depending on the shape of the tree. This represents a
modelling risk: inadequate shapes of aggregation trees might lead to a large over- or
under-estimation of the diversification gain.

The only way to avoid these systematic effects again relies on an adequate modelling of risks.
As was already the case when we considered the non-trivial role of the order in which the risks
are aggregated, it is critical to stay as close as possible to the business. The shape of the tree
should be neither arbitrary nor based on oversimplifications.

Of course real portfolio analysis is plagued by the fact that, owing to the complexity of real
portfolios, modelling choices are always ambiguous. A good practice, although time-consuming, would
thus be to design and calibrate several distinct aggregation trees corresponding to different
modelling choices, in order to get a better picture of the landscape of diversification.

In the last part of this work, we also showed how the joint distribution of the risks can be entirely
determined in the Gaussian case by adding other constraints to the aggregation process, namely
conditional independence statements between nodes in the tree. Further work would be needed
to understand whether the set of constraints that we used is the minimal set that completely
specifies the joint distribution. Also, it would be worth studying whether this holds
as well for more general (non-Gaussian) trees.
Appendix A: Multivariate Normals and Gaussian Copula

Definition 4. A random variable $X$ is said to be univariate normal, or normal for short, if its
density function reads:

$$f(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \qquad \text{(A.1)}$$

and we note $X \sim \mathcal{N}(\mu, \sigma^2)$.



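Definition 5. A random vector $\mathbf{X} = (X_1, \ldots, X_N)$ is said to have a non-singular multivariate
normal distribution iff there exists a symmetric positive-definite matrix $C$ and a vector $\boldsymbol{\mu}$ such
that the joint density reads

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{N/2}\,|C|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})'\,C^{-1}\,(\mathbf{x} - \boldsymbol{\mu})\right) \qquad \text{(A.2)}$$

where $\mathbf{x}$ and $\boldsymbol{\mu}$ are column vectors in $\mathbb{R}^N$, $\mathbf{x} = (x_1, \ldots, x_N)'$, and where the prime
indicates the transposition.

Observe the boldface notation that, as usual, denotes vectors (and sometimes matrices as well).
The matrix $C = \mathrm{cov}(\mathbf{X})$ is known as the covariance matrix, i.e. an $N \times N$ matrix whose elements
are $C_{ij} = \mathrm{cov}(X_i, X_j)$.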

Theorem A.1 (Linear combinations of multivariate normal). See e.g. [14, Chp. 3].
Let $\mathbf{X} = (X_1, \ldots, X_N)$ be any multivariate normal distribution of dimension $N$, with covariance
matrix $\mathrm{cov}(\mathbf{X})$, and let $A$ be a $k \times N$ real matrix. Then

$$\mathbf{Y} = A\mathbf{X}'$$

is also multivariate normal of dimension $k$ with $k \times k$ covariance matrix given by:

$$\mathrm{cov}(\mathbf{Y}) = A\,\mathrm{cov}(\mathbf{X})\,A' \qquad \text{(A.3)}$$



As a consequence we have the following:

Corollary A.2 (Sums of Gaussians that are jointly normal). See e.g. [14, Chp. 3].
Let $\mathbf{X} = (X_1, \ldots, X_N)$ be any multivariate normal distribution of dimension $N$, as defined
above, with covariance matrix $C_{ij}$ and mean $\boldsymbol{\mu}$. Then the sum of these Gaussian marginals is
itself univariate normal. Defining $Z = \sum_{i=1}^{N} X_i$, we have

$$Z \sim \mathcal{N}(\mu_Z, \sigma_Z^2) \qquad \text{(A.4)}$$

with

$$\sigma_Z^2 = \sum_{i=1}^{N}\sum_{j=1}^{N} C_{ij} \quad \text{and} \quad \mu_Z = \sum_{i=1}^{N} \mu_i \qquad \text{(A.5)}$$

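We also recall a standard result on conditioning on multivariate normals.

Theorem A.3 (Conditioning on multivariate normals). See e.g. [14, Chp. 3].
Let $\mathbf{X} = (X_1, \ldots, X_N)$ be jointly normal with covariance matrix $C$ and mean $\boldsymbol{\mu}$. Let $\boldsymbol{\mu}$ and $C$
be partitioned as follows: $\boldsymbol{\mu} = (\boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$ with respective sizes $q$ and $N - q$, and accordingly

$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$$

Then, provided $C_{22}$ is positive definite, $\mathbf{X}_1 \,|\, \mathbf{X}_2 = \mathbf{x}_2$ is multivariate normal with mean
$\bar{\boldsymbol{\mu}} = \boldsymbol{\mu}_1 + C_{12} C_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2)$ and covariance matrix $\bar{C} = C_{11} - C_{12} C_{22}^{-1} C_{21}$.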

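As a consequence we have the following:

Corollary A.4. Let $\mathbf{X} = (X_1, \ldots, X_N)$ be jointly normal with covariance matrix $C$ and mean
$\boldsymbol{\mu}$. Let $Y = \sum_i X_i$. Then $\mathbf{X} \,|\, Y = y$ is multivariate normal with mean
$\bar{\boldsymbol{\mu}} = \boldsymbol{\mu} + C^*_{12} (C^*_{22})^{-1} \left(y - \sum_i \mu_i\right)$
and covariance matrix $\bar{C} = C - C^*_{12} (C^*_{22})^{-1} C^*_{21}$, where $C^*_{12}$ is an $N$ column vector of
elements $\sum_j C_{ij}$, $C^*_{21}$ is its transpose, and $C^*_{22} = \sum_{ij} C_{ij}$.

Proof. First consider the vector $\mathbf{x}^* = (X_1, \ldots, X_N, Y)$. Note that it is also given by $\mathbf{x}^* = M\mathbf{X}$
where $M$ is an $(N + 1) \times N$ matrix (the upper $N \times N$ matrix is the identity matrix, and the
last line of $M$ is full of 1). By virtue of Theorem A.1 it implies that $\mathbf{x}^*$ is an $(N + 1)$-dimensional
multivariate normal with mean given by $\boldsymbol{\mu}^* = M\boldsymbol{\mu}$ and covariance matrix $C^* = MCM'$. One computes
$\boldsymbol{\mu}^* = (\boldsymbol{\mu}, \sum_i \mu_i)$ and

$$C^* = \begin{pmatrix} C & C^*_{12} \\ C^*_{21} & C^*_{22} \end{pmatrix}$$

Note that $C^*_{22} > 0$ because $C$ is positive definite. Therefore we can apply the previous Theorem
A.3 to show that $\mathbf{X} \,|\, Y = y$ is $N$-dimensional normal with
$\bar{\boldsymbol{\mu}} = \boldsymbol{\mu} + C^*_{12} (C^*_{22})^{-1} \left(y - \sum_i \mu_i\right)$ and
$\bar{C} = C - C^*_{12} (C^*_{22})^{-1} C^*_{21}$. □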
Definition 6 (Gaussian Copula). Let $\Phi_\Sigma$ be the cumulative distribution function of a $k$-variate
normal with zero mean and covariance matrix $\Sigma$, and let $\Phi$ be the cdf of a standard normal.
Then the Gaussian copula with correlation matrix $\Sigma$ is the function:

$$C_\Sigma(\mathbf{u}) = \Phi_\Sigma\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_k)\right) \qquad \text{(A.6)}$$

In the case where $\Sigma$ reads $\Sigma = \mathbb{1}_k + \rho\,(J_k - \mathbb{1}_k)$, i.e.

$$\Sigma = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \\ \rho & \cdots & \rho & 1 \end{pmatrix}$$

we will refer to the equicorrelation Gaussian copula of parameter $\rho$. It is also an exchangeable
copula, meaning that it is invariant under any permutation of the $u_i$'s.
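For readers who want to experiment with this copula, the sketch below draws samples from the equicorrelation Gaussian copula. It is not taken from the paper: it relies on the standard one-factor representation $X_i = \sqrt{\rho}\,Z_0 + \sqrt{1-\rho}\,Z_i$ with independent standard normals, which only covers $0 \le \rho < 1$. The function name and the seeding convention are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

def sample_equicorr_copula(k, rho, n, seed=0):
    """Draw n points (u_1, ..., u_k) from the equicorrelation Gaussian copula, 0 <= rho < 1.

    One-factor construction: X_i = sqrt(rho) * Z0 + sqrt(1 - rho) * Z_i has unit variance
    and pairwise correlation rho; u_i = Phi(X_i) then has uniform margins."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    samples = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)  # common factor shared by the k coordinates
        xs = [a * z0 + b * rng.gauss(0.0, 1.0) for _ in range(k)]
        samples.append([std_normal.cdf(x) for x in xs])
    return samples

points = sample_equicorr_copula(k=3, rho=0.4, n=5)
for u in points:
    print([round(v, 3) for v in u])
```

Applying the inverse cdf of any marginal distribution to each coordinate then gives dependent risks with that marginal. Exchangeability shows up in the fact that permuting the coordinates of a sample does not change its distribution.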



Appendix B: Proof of Theorem VI.1

Given any $k \geq 2$, the proof is by strong recurrence on $m$. Initialization ($m = 1$) is clear (see e.g.
the proof of Theorem IV.1). The basic idea behind the proof by recurrence is to note that $T_{k,m}$ is
obtained by aggregating together $k$ times the subtrees $T_{k,m-1}$ for which the theorem is assumed
to hold.

Note that in this approach it is necessary to rename accordingly all the random variables,
variances and so on attached to the subtrees (e.g. $\sigma_{(m-1)}$ of the subtree becomes $\sigma_{(m)}$ of the
full tree, etc.). Also, for more clarity, we denote in this proof the roots of the subtrees by $Y_1, \ldots, Y_k$.

Step 1. Characteristic function and joint normality of the leaves. We first compute the
characteristic function of the leaves and show how conditional independence (CI) statements enable
its calculation. This will show that the leaves are jointly normal. Let $\mathbf{t} \in \mathbb{R}^N$, $N = k^m$, and
$\mathbf{X}_{(m)}$ the vector made of the $X_i^{(m)}$'s. Then:

$$\Phi_{\mathbf{X}_{(m)}}(\mathbf{t}) = E\left[\exp\left(\mathrm{i}\,\mathbf{t}'\mathbf{X}_{(m)}\right)\right]$$

We note $X_u^{(m)}$ the leaves such that $X_u^{(m)} < Y_i$ for $i \in [1, k]$, and we note $\mathcal{L}_i = \{u : X_u^{(m)} < Y_i\}$
these sets of leaves. Then the scalar product $\mathbf{t}'\mathbf{X}_{(m)}$ is naturally split in $k$ sums according to
these sets $\mathcal{L}_i$:

$$\mathbf{t}'\mathbf{X}_{(m)} = \sum_{i=1}^{k} \mathbf{t}_i'\,\mathbf{X}_i^{(m)}$$

where the notation is self-explanatory. Then

$$\Phi_{\mathbf{X}_{(m)}}(\mathbf{t}) = E\left[\prod_{i=1}^{k} \exp\left(\mathrm{i}\,\mathbf{t}_i'\mathbf{X}_i^{(m)}\right)\right]$$



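that we now condition on all the roots of the subtrees using the law of total expectation:

$$\Phi_{\mathbf{X}_{(m)}}(\mathbf{t}) = E\left[E\left[\prod_{i=1}^{k} \exp\left(\mathrm{i}\,\mathbf{t}_i'\mathbf{X}_i^{(m)}\right) \,\Big|\, \mathbf{Y}\right]\right]$$

By virtue of conditional independence, we however have, $\forall (u, v) \in \mathcal{L}_i \times \mathcal{L}_j$ with $i \neq j$, the
property $X_u^{(m)} \perp X_v^{(m)} \,|\, Y_i$ and therefore, $\forall \mathbf{t}, \forall i \neq j$,
$\mathbf{t}_i'\mathbf{X}_i^{(m)} \perp \mathbf{t}_j'\mathbf{X}_j^{(m)} \,|\, Y_i$, and finally

$$\forall \mathbf{t}, \quad \mathbf{t}_i'\mathbf{X}_i^{(m)} \perp \mathbf{t}_j'\mathbf{X}_j^{(m)} \,\big|\, \mathbf{Y}$$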

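This makes it possible to factorize the characteristic function as†

$$\Phi_{\mathbf{X}_{(m)}}(\mathbf{t}) = E\left[\prod_{i=1}^{k} E\left[\exp\left(\mathrm{i}\,\mathbf{t}_i'\mathbf{X}_i^{(m)}\right) \,\Big|\, \mathbf{Y}\right]\right]$$

Other conditional independence statements of the form

$$\mathbf{X}_i^{(m)} \perp \mathbf{Y}_{-i} \,\big|\, Y_i$$

where $\mathbf{Y}_{-i}$ is the vector $\mathbf{Y}$ with the $i$-th element removed, enable one to remove redundant
conditioning in the expression for the characteristic function, and one finally gets:

$$\Phi_{\mathbf{X}_{(m)}}(\mathbf{t}) = E\left[\prod_{i=1}^{k} E\left[\exp\left(\mathrm{i}\,\mathbf{t}_i'\mathbf{X}_i^{(m)}\right) \,\Big|\, Y_i\right]\right] \qquad \text{(B.1)}$$

which is now calculable, because $E\left[\exp\left(\mathrm{i}\,\mathbf{t}_i'\mathbf{X}_i^{(m)}\right) | Y_i\right]$ is a known random variable thanks to
the recurrence hypothesis (it indeed involves only the properties of the subtrees, which are assumed
by recurrence to be jointly normal), whereas the second expectation is between (products of)
random variables related to the $Y_i$ only, and which are, by hypothesis, univariate normals
aggregated together via a Gaussian copula. We have

$$\Phi_{\mathbf{X}_{(m)}}(\mathbf{t}) = \int f_{\mathbf{Y}}(\mathbf{y}) \prod_{i=1}^{k} \left(\int d\mathbf{x}_i \exp\left(\mathrm{i}\,\mathbf{t}_i'\mathbf{x}_i\right) f(\mathbf{x}_i \,|\, Y_i = y_i)\right) d\mathbf{y} \qquad \text{(B.2)}$$

where $\mathbf{y} = (y_1, \ldots, y_k)$, $f_{\mathbf{Y}}(\mathbf{y})$ is the joint density of the random variables $Y_i$, and where
$f(\mathbf{x}_i \,|\, Y_i = y_i)$ is a conditional density function.

† If $X \perp Y \,|\, Z$ then $E(XY|Z) = E(X|Z)\,E(Y|Z)$.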

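The leaves $X_j^{(m)}$ for $j \in \mathcal{L}_i$ are jointly normal by recurrence hypothesis and $Y_i = \sum_{j \in \mathcal{L}_i} X_j^{(m)}$. As
shown in Corollary A.4 this implies that $\mathbf{X}_i^{(m)} \,|\, Y_i = y_i$ is multivariate normal and $f(\mathbf{x}_i \,|\, Y_i = y_i)$
is its density. As a consequence, the multiple integral over $\mathbf{x}_i$ leads to a conditional characteristic
function of a multivariate normal:

$$\int d\mathbf{x}_i \exp\left(\mathrm{i}\,\mathbf{t}_i'\mathbf{x}_i\right) f(\mathbf{x}_i \,|\, Y_i = y_i) = \exp\left(\mathrm{i}\,\mathbf{t}_i'\bar{\boldsymbol{\mu}}_i - \frac{1}{2}\mathbf{t}_i' S_i \mathbf{t}_i\right),$$

where $\bar{\boldsymbol{\mu}}_i$, as shown in Corollary A.4, is affine in $y_i$ whereas the covariance matrix $S_i$ does not depend
on $y_i$. In fact, because the means of the leaves are zero, $\bar{\boldsymbol{\mu}}_i$ is even linear in $y_i$: there exists a
column vector $B_i$ such that $\bar{\boldsymbol{\mu}}_i = B_i y_i$.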



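Replacing in Eq. (B.2), and factorizing, we get, defining $s_i = \mathbf{t}_i' B_i$ and $\mathbf{s} = (s_1, \ldots, s_k)'$:

$$\begin{aligned}
\Phi_{\mathbf{X}_{(m)}}(\mathbf{t}) &= \prod_{i=1}^{k} \exp\left(-\frac{1}{2}\mathbf{t}_i' S_i \mathbf{t}_i\right) \int d\mathbf{y}\, f_{\mathbf{Y}}(\mathbf{y}) \prod_{i=1}^{k} \exp\left(\mathrm{i}\,\mathbf{t}_i' B_i\, y_i\right) \\
&= \exp\left(-\frac{1}{2}\sum_{i=1}^{k} \mathbf{t}_i' S_i \mathbf{t}_i\right) \int d\mathbf{y}\, f_{\mathbf{Y}}(\mathbf{y}) \exp\left(\mathrm{i}\,\mathbf{s}'\mathbf{y}\right) \\
&= \exp\left(-\frac{1}{2}\sum_{i=1}^{k} \mathbf{t}_i' S_i \mathbf{t}_i\right) \exp\left(-\frac{1}{2}\mathbf{s}'\Sigma_{\mathbf{Y}}\mathbf{s}\right) \\
&= \exp\left(-\frac{1}{2}\sum_{i=1}^{k} \mathbf{t}_i' S_i \mathbf{t}_i - \frac{1}{2}\sum_{i,j=1}^{k} s_i\,(\Sigma_{\mathbf{Y}})_{ij}\, s_j\right) \\
&= \exp\left(-\frac{1}{2}\mathbf{t}' A\, \mathbf{t}\right)
\end{aligned} \qquad \text{(B.3)}$$

where we used that the integral in the second line is the characteristic function of the jointly
normal distribution of $\mathbf{Y}$, and therefore there exists some matrix $\Sigma_{\mathbf{Y}}$ such that the integral takes
the form in the third line (the means of the $Y_i$ are zero), and where, in the last line, the existence
of a constant matrix $A$ is readily proven, noting that

$$A_{ij} = \partial_{t_i}\partial_{t_j}\left(-\ln \Phi_{\mathbf{X}_{(m)}}(\mathbf{t})\right)$$

and that (see the fourth line) the form is not more than quadratic in the $t_i$'s because the $s_i$'s are
linear combinations of the $t_i$'s.

This proves the existence of a matrix $A$ such that the joint characteristic function is
an exponential of the quadratic form defined by $A$. Moreover, this particular form of the
characteristic function implies that $A$ is also the covariance matrix of $\mathbf{X}$, and therefore is
non-negative definite. As is well known, a characteristic function for $\mathbf{X}$ of the form above with a
non-negative definite matrix $A$ implies that $\mathbf{X}$ is jointly normal. □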

Step 2. Explicit calculation of the covariance matrix. The matrix $A$ could be computed directly
by an explicit evaluation of Eq. (B.3) above. There is however an easier route, which exploits
the many symmetries of the problem.

The block diagonal terms of the covariance matrix are easily derived. Consider the leaves
that are descendants of $Y_i$ for a given $i$. By recurrence hypothesis, these leaves are jointly
normal with a covariance matrix $C^{(m-1)}$ as given in the theorem, up to the necessary change
in notations explained at the beginning of the proof. This holds for every $i$, hence showing that
the covariance matrix $C^{(m)}$ must have the block-diagonal structure given in the theorem.

Off-diagonal block terms: For any pair $i \neq j \in [1, k]^2$ we compute the covariance matrix
elements† $\mathrm{cov}(X_u^{(m)}, X_v^{(m)})$ for $(u, v) \in \mathcal{L}_i \times \mathcal{L}_j$. These elements of the covariance matrix
correspond to the elements found inside the $(i, j)$ block of the covariance matrix when written
in $k \times k$ blocks of size $k^{m-1}$. Applying two times the conditional independence as done in step
1, we have‡, using that the means vanish:

$$\begin{aligned}
\mathrm{cov}\left(X_u^{(m)}, X_v^{(m)}\right) &= E\left[X_u^{(m)} X_v^{(m)}\right] \\
&= E\left[E\left[X_u^{(m)} \,|\, Y_i, Y_j\right] E\left[X_v^{(m)} \,|\, Y_i, Y_j\right]\right] \\
&= E\left[E\left[X_u^{(m)} \,|\, Y_i\right] E\left[X_v^{(m)} \,|\, Y_j\right]\right]
\end{aligned} \qquad \text{(B.4)}$$

This shows again how CI makes it possible to compute these off-diagonal terms of the covariance matrix.
Indeed $E[X_u^{(m)} | Y_i]$ depends only on the subtree, whereas the expectation of the product of
expectations in the last line only depends on the last aggregation step from level (1) to the root.

More precisely, $E[X_u^{(m)} | Y_i]$ could be explicitly computed using techniques similar to the ones
found in Corollary A.4 and using the recurrence hypothesis. It is however not necessary. Indeed
it is enough to notice that, because the leaves are all identical and because the exchangeable
(i.e. with equicorrelation matrix) copula is used, $E[X_u^{(m)} | Y_i]$ cannot depend on $u$ nor on $i$.

† To make some contact with step 1 above, the off-diagonal elements clearly come from the linear relation
between $\mathbf{s}$ and $\mathbf{t}$, see Eq. (B.3).

‡ I.e. using first the law of total expectation, and conditioning on the roots of the subtrees in question, applying
afterwards the conditional independence hypothesis with e.g. $X_u^{(m)} \perp X_v^{(m)} \,|\, Y_i$. Finally we apply another
conditional independence statement: $X_u^{(m)} \perp Y_j \,|\, Y_i$, implying that $E[X_u^{(m)} | Y_i, Y_j] = E[X_u^{(m)} | Y_i]$.
Similarly the exchangeable copula used at the last aggregation step does not introduce any
asymmetry between the roots of the subtrees $Y_i$ and $Y_j$, and therefore $\mathrm{cov}(X_u^{(m)}, X_v^{(m)})$ must be a
constant for all $i \neq j$ and for all $u, v$. We denote the off-block-diagonal elements of $C^{(m)}$ by
$\beta_{m-1}$ and any $i \neq j$ off-diagonal block by $B^{(m-1)} = \beta_{m-1} J_{k^{m-1}}$.

The last task is to compute $\beta_{m-1}$. First we note that $X_u^{(m)} \,|\, Y_i = y_i$ is a Gaussian of mean $y_i/N'$,
where $N' = k^{m-1}$ is the number of leaves in the subtree $Y_i$. This is clear by symmetry, but can
also be proven¶ directly. Then we have that

$$\beta_{m-1} = \frac{E[Y_i Y_j]}{N'^2}$$

where, as already argued, $E[Y_i Y_j]$ is independent of the pair $i \neq j$ because the $Y_i$ have all
the same marginals and are aggregated via an exchangeable copula. Since $Z^2 = \sum_i Y_i^2 +
\sum_{i \neq j} Y_i Y_j$, noting that there are $k(k-1)$ terms in the double sum, we get

$$E[Y_i Y_j] = \frac{1}{k(k-1)}\left(E[Z^2] - k\,E[Y_i^2]\right)$$

Because the means vanish, we note that $E[Y_i^2] = \sigma_{(1)}^2$ and $E[Z^2] = \sigma_Z^2$. The recurrence
hypothesis gives the relation between the variances $\sigma_{(1)}^2 = \sigma_{(m)}^2 \left(k + (k^2 - k)\rho\right)^{m-1}$, and we need to prove
that it also holds at the upper level, i.e. that $\sigma_Z^2 = \sigma_{(1)}^2 \left(k + (k^2 - k)\rho\right)$. This is straightforward
from the fact that $Y_i \sim \mathcal{N}(0, \sigma_{(1)}^2)$ and the aggregation mechanism, and follows as a special case
of Theorem IV.1. Therefore we get, all calculations done:

$$\beta_{m-1} = \rho\,\sigma_{(m)}^2 \left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)^{m-1}$$

thus establishing the claim. □

¶ Along the lines of Theorem A.3 and Corollary A.4. Choose one particular subtree, e.g. the one issued from $Y_1$.
Its covariance matrix is known by recurrence hypothesis. Construct the bivariate normal $(X_1, Y_1)$ and then
condition on $Y_1 = y_1$. Evaluate the formula for the mean of $X_1 \,|\, Y_1 = y_1$, and find $\bar{\mu} = \gamma y_1$ where $\gamma$ is the ratio
between the sum of elements of the first line of the covariance matrix of the subtree, and the sum of all its
elements. By symmetry of the covariance matrix given by recurrence, this ratio must be $1/N'$.
Step 3: Exhausting the conditional independence statements. We showed how to use some of
the conditional independence (CI) statements to prove that the leaves are jointly normal with
a covariance matrix completely determined and given as in the theorem. However the claim is
also that all the other CI statements are then satisfied. This is what we show now.

To start with we note that because of the aggregation mechanism, any node $W$ in the $T_{k,m}$ tree is
either a leaf or a sum of leaves. In any case, it can be written as a sum of leaves over a given
set of indices $J_W$:

$$W = \sum_{u \in J_W} X_u^{(m)}$$

As a consequence, any CI statement given in Definition 3 can be deduced from the following
ones (e.g. by adding them together, etc.):

$$\forall W \in \Gamma, \ \forall i \in J_W, \ \forall j \notin J_W: \quad X_i^{(m)} \perp X_j^{(m)} \,\big|\, W \qquad \text{(B.5)}$$

where $\Gamma$ is the set of nodes that are neither leaves of the tree nor the root.

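This set of relations is then expressed as algebraic equations between the coefficients of the
covariance matrix $C^{(m)}$. To this end one follows the techniques found in Theorem A.3 and
Corollary A.4, and constructs a new multivariate normal:

$$\mathbf{x} = \left(X_i^{(m)}, \ X_j^{(m)}, \ \sum_{u \in J_W} X_u^{(m)}\right)' = M\,\mathbf{X}_{(m)}$$

where $M$ is a $3 \times N$ matrix with first line equal to $\delta_{ik}$ with running $k$, second line equal
to $\delta_{jk}$ with running $k$, and third line full of 1's for indices in $J_W$, zero otherwise. Then $\mathbf{x}$
is multivariate normal with covariance matrix given by $S = M C^{(m)} M'$. This matrix is easily
computed and reads

$$S = \begin{pmatrix} C_{ii}^{(m)} & C_{ij}^{(m)} & C_{iJ_W}^{(m)} \\ C_{ij}^{(m)} & C_{jj}^{(m)} & C_{jJ_W}^{(m)} \\ C_{iJ_W}^{(m)} & C_{jJ_W}^{(m)} & C_{J_WJ_W}^{(m)} \end{pmatrix}$$

where $C_{iJ_W}^{(m)} = \sum_{u \in J_W} C_{iu}^{(m)}$ and $C_{J_WJ_W}^{(m)} = \sum_{(u,v) \in J_W^2} C_{uv}^{(m)}$. Then, conditioning on $W$, we have
that $X_i, X_j \,|\, W$ is normal with a $2 \times 2$ covariance matrix $\bar{S}$ given by

$$\bar{S} = \begin{pmatrix} C_{ii}^{(m)} - \dfrac{C_{iJ_W}^{(m)} C_{iJ_W}^{(m)}}{C_{J_WJ_W}^{(m)}} & C_{ij}^{(m)} - \dfrac{C_{iJ_W}^{(m)} C_{jJ_W}^{(m)}}{C_{J_WJ_W}^{(m)}} \\ C_{ij}^{(m)} - \dfrac{C_{iJ_W}^{(m)} C_{jJ_W}^{(m)}}{C_{J_WJ_W}^{(m)}} & C_{jj}^{(m)} - \dfrac{C_{jJ_W}^{(m)} C_{jJ_W}^{(m)}}{C_{J_WJ_W}^{(m)}} \end{pmatrix}$$

The conditional independence requires the off-diagonal terms to vanish. Therefore we conclude
that all CI statements that can be made in the Gaussian tree with covariance matrix $C^{(m)}$
are of the following form:

$$\forall W \in \Gamma, \ \forall i \in J_W, \ \forall j \notin J_W: \quad C_{ij}^{(m)} = \frac{C_{iJ_W}^{(m)}\, C_{jJ_W}^{(m)}}{C_{J_WJ_W}^{(m)}} \qquad \text{(B.6)}$$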

We now prove that these conditions hold. For this we compute each of these terms separately.
We first prove Proposition VI.2 on effective couplings that we shall use below. Two nodes $X_i^{(m)}$
and $X_j^{(m)}$ connected via a node at level $p$ but not connected by any node at level $p' > p$ are leaves
of a subtree $T_{k,m-p}$ for which the covariance matrix is known by strong recurrence hypothesis
(and if $p = 0$, the covariance matrix has been derived in the previous steps). Moreover, as the
node at level $p$ is the first node connecting these two leaves, $\mathrm{cov}(X_i^{(m)}, X_j^{(m)})$ is given by the
off-block-diagonal elements of $C^{(m-p)}$, hence by $\beta_{m-p-1}$. This establishes the claim.

Now, for any $W$ there exist $r, p$ such that $W = X_r^{(p)}$ with $p \neq 0$ and $p \neq m$. Thus $W$ is the
root of a $T_{k,m-p}$ Gaussian tree, and by virtue of Corollary A.2, $C_{J_WJ_W}^{(m)} = \sum_{(u,v) \in J_W^2} C_{uv}^{(m)}$ equals
the variance of $W$, which, by virtue of Theorem IV.1, reads $\sigma_{(p)}^2$. Hence

$$C_{J_WJ_W}^{(m)} = \sigma_{(p)}^2$$

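To compute $C_{iJ_W}^{(m)} = \sum_{u \in J_W} C_{iu}^{(m)}$ with $i \in J_W$, we recognize that the total number of terms in
the sum reads $k^{m-p}$, which can be split as $k^{m-p} = (k^{m-p} - k^{m-p-1}) + (k^{m-p-1} - k^{m-p-2}) +
\ldots + (k - 1) + 1$, which corresponds to the numbers of leaves having a respective effective coupling
with the leaf $i$ given by $\beta_{m-s-1}$ with a running $s$, i.e.

$$C_{iJ_W}^{(m)} = \sigma_{(m)}^2 + \sum_{s=p}^{m-1} \left(k^{m-s} - k^{m-s-1}\right) \beta_{m-s-1}$$

Thus,

$$C_{iJ_W}^{(m)} = \sigma_{(m)}^2 \left[1 + \rho\,(k-1) \sum_{s=0}^{m-p-1} \left(1 + (k-1)\rho\right)^{s}\right] = \sigma_{(m)}^2 \left(1 + (k-1)\rho\right)^{m-p}$$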



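For $j \notin J_W$, $C_{jJ_W}^{(m)} = \sum_{u \in J_W} C_{ju}^{(m)}$ has again $k^{m-p}$ terms, which are this time all equal to each
other, but this constant depends on the position of the leaf $j$ with respect to the set $J_W$, i.e.
at which level $l$ one finds the first node connecting these leaves (with of course $l < p$, since this
node is an ancestor of $W$ and the root sits at level 0). Then we
have (using again the effective correlation coefficients):

$$C_{jJ_W}^{(m)} = k^{m-p}\,\rho\,\sigma_{(m)}^2 \left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)^{m-l-1}$$

Finally $C_{ij}^{(m)}$ is computed in the same way as $C_{jJ_W}^{(m)}$, but without any summation in this case. Hence:

$$C_{ij}^{(m)} = \rho\,\sigma_{(m)}^2 \left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)^{m-l-1}$$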

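We are now in position to form the right hand side of Eq. (B.6). For all $W$, i.e. for all $(p, l)$,
the r.h.s. reads

$$\frac{C_{iJ_W}^{(m)}\, C_{jJ_W}^{(m)}}{C_{J_WJ_W}^{(m)}} = \frac{\sigma_{(m)}^2 \left(1 + (k-1)\rho\right)^{m-p} \; k^{m-p}\,\rho\,\sigma_{(m)}^2 \left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)^{m-l-1}}{\sigma_{(m)}^2 \left(k + (k^2 - k)\rho\right)^{m-p}} = \sigma_{(m)}^2\,\rho \left(\frac{1}{k} + \left(1 - \frac{1}{k}\right)\rho\right)^{m-l-1} = C_{ij}^{(m)}$$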

Where the relation between variances has been used: a? 